Sprinkling Topics for Weakly Supervised Text Classification

نویسندگان

  • Swapnil Hingmire
  • Sutanu Chakraborti
چکیده

Supervised text classification algorithms require a large number of documents labeled by humans, that involve a laborintensive and time consuming process. In this paper, we propose a weakly supervised algorithm in which supervision comes in the form of labeling of Latent Dirichlet Allocation (LDA) topics. We then use this weak supervision to “sprinkle” artificial words to the training documents to identify topics in accordance with the underlying class structure of the corpus based on the higher order word associations. We evaluate this approach to improve performance of text classification on three real world datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification

We propose a Label Propagation based algorithm for weakly supervised text classification. We construct a graph where each document is represented by a node and edge weights represent similarities among the documents. Additionally, we discover underlying topics using Latent Dirichlet Allocation (LDA) and enrich the document graph by including the topics in the form of additional nodes. The edge ...

متن کامل

Semi-supervised Learning with Weakly-Related Unlabeled Data: Towards Better Text Categorization

The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this ...

متن کامل

Supervised Latent Semantic Indexing Using Adaptive Sprinkling

Latent Semantic Indexing (LSI) has been shown to be effective in recovering from synonymy and polysemy in text retrieval applications. However, since LSI ignores class labels of training documents, LSI generated representations are not as effective in classification tasks. To address this limitation, a process called ‘sprinkling’ is presented. Sprinkling is a simple extension of LSI based on au...

متن کامل

Incremental Learning on Sentiment Analysis Using Weakly Supervised Learning Techniques

Due to the advanced technologies of Web 2.0, people are participating in and exchanging opinions through social media sites such as Web forums and Weblogs etc., Classification and Analysis of such opinions and sentiment information is potentially important for both service and product providers, users because this analysis is used for making valuable decisions. Sentiment is expressed differentl...

متن کامل

Weakly Supervised Learning for Hedge Classification in Scientific Literature

We investigate automatic classification of speculative language (‘hedging’), in biomedical text using weakly supervised machine learning. Our contributions include a precise description of the task with annotation guidelines, analysis and discussion, a probabilistic weakly supervised learning model, and experimental evaluation of the methods presented. We show that hedge classification is feasi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014